Relational Data Mining Using Probabilistic Relational Models
Abstract
This thesis documents the design, implementation and test of Probabilistic Relational Models (PRMs). PRMs are a graphical statistical approach to modeling relational data using the Relational Language. PRMs consist of two components: the dependency structure and the parameters. Our design is based on simplicity, flexibility, and performance. We explain the search over possible structures, using a notion of potential parents. The potential parents are used in four different search algorithms: one greedy, two random, and a hybrid between greedy and random. We also explain explicitly how the parameters are learned using sufficient statistics, by considering internal and external dependencies and how to keep track of the context. We perform five different tests, showing that the penalty term of the score function can be tweaked to control the trade-off between maximum likelihood and complexity. Our limited scalability test shows that our implementation scales nearly linearly, although it could be further optimized. The search test of the four algorithms shows that introducing randomness is beneficial if combined with a greedy approach. The results also show that although greedy finds the best model, the hybrid approach comes close in less time. In comparison with our prior work, it is clear that PRMs are a very good alternative to propositional data mining in terms of descriptiveness.
Preface
This thesis documents the design, implementation and test of Probabilistic Relational Models (PRMs). The thesis was written during DAT6 (8th semester) at Aalborg University, Denmark. The reader should be familiar with our prior work [10]. A short summary of it is provided for convenience in the introductory parts of the thesis. Special thanks to Manfred Jaeger for his guidance and assistance during the project.
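The abstract's point about the penalty term can be illustrated with a minimal sketch. It is not the thesis's score function, which is not given here: it assumes a BIC-style penalty as a stand-in, works on a single flat table rather than relational data, and uses hypothetical names throughout (`penalized_score`, `penalty_weight`, and the attributes `difficulty` and `grade`). It shows how sufficient statistics (counts) yield a maximum log-likelihood, and how a weight on the complexity penalty shifts the preferred structure.

```python
import math
from collections import Counter

def sufficient_statistics(rows, child, parents):
    """Count how often each (parent configuration, child value) pair occurs."""
    counts = Counter()
    for row in rows:
        config = tuple(row[p] for p in parents)
        counts[(config, row[child])] += 1
    return counts

def penalized_score(rows, child, parents, arity, penalty_weight=1.0):
    """Maximum log-likelihood minus a weighted BIC-style penalty (an assumption)."""
    counts = sufficient_statistics(rows, child, parents)
    # Total number of rows seen for each parent configuration.
    config_totals = Counter()
    for (config, _), n in counts.items():
        config_totals[config] += n
    # Log-likelihood at the maximum-likelihood parameters n / total.
    log_lik = sum(n * math.log(n / config_totals[config])
                  for (config, _), n in counts.items())
    # Free parameters: one distribution over the child per parent configuration.
    num_configs = math.prod(arity[p] for p in parents)
    free_params = num_configs * (arity[child] - 1)
    penalty = 0.5 * math.log(len(rows)) * free_params
    return log_lik - penalty_weight * penalty

# Hypothetical toy data: grade is fully determined by difficulty.
rows = [
    {"difficulty": "hi", "grade": "low"},
    {"difficulty": "hi", "grade": "low"},
    {"difficulty": "lo", "grade": "high"},
    {"difficulty": "lo", "grade": "high"},
]
arity = {"difficulty": 2, "grade": 2}

with_parent = penalized_score(rows, "grade", ["difficulty"], arity, 1.0)
without_parent = penalized_score(rows, "grade", [], arity, 1.0)
```

With `penalty_weight=1.0` the structure that includes the parent scores higher on this toy data; raising the weight to `5.0` makes the simpler, parentless structure win, which is the kind of trade-off the abstract describes.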
Similar resources
Multi-Relational Data Mining using Probabilistic Models Research Summary
We are often faced with the challenge of mining data represented in relational form. Unfortunately, most statistical learning methods work only with “flat” data representations. Thus, to apply these methods, we are forced to convert the data into a flat form, thereby not only losing its compact representation and structure but also potentially introducing statistical skew. These drawbacks sever...
Mining Frequent Patterns in Uncertain and Relational Data Streams using the Landmark Windows
Today, in many modern applications, we search for frequent and repeating patterns in the analyzed data sets. In this search, we look for patterns that frequently appear in the data set and mark them as frequent patterns to enable users to make decisions based on these discoveries. Most algorithms presented in the context of data stream mining and frequent pattern detection work either on uncertai...
Probabilistic Relational Model Benchmark Generation
The validation of any database mining methodology goes through an evaluation process where benchmark availability is essential. In this paper, we aim to randomly generate relational database benchmarks that allow checking probabilistic dependencies among the attributes. We are particularly interested in Probabilistic Relational Models (PRMs), which extend Bayesian Networks (BNs) to a relationa...
Learning Multi-Relational Semantics Using Neural-Embedding Models
Real-world entities (e.g., people and places) are often connected via relations, forming multi-relational data. Modeling multi-relational data is important in many research areas, from natural language processing to biological data mining [6]. Prior work on multi-relational learning can be categorized into three categories: (1) statistical relational learning (SRL) [10], such as Markov logic netw...
Relational Sequence Learning
Sequential behavior and sequence learning are essential to intelligence. Often the elements of sequences exhibit an internal structure that can elegantly be represented using relational atoms. Applying traditional sequential learning techniques to such relational sequences requires either ignoring the internal structure or putting up with a combinatorial explosion in the model complexity. This c...
Publication date: 2008